Abstract: Multicast transmission and wireless caching are
effective ways of reducing air and backhaul traffic load in wireless 
networks. This paper proposes to incorporate these two key ideas 
for content-centric multicast transmission in a cloud radio access 
network (RAN) where multiple base stations (BSs) are connected 
to a central processor (CP) via finite-capacity backhaul links. 
Each BS has a cache with finite storage size and is equipped 
with multiple antennas. The BSs cooperatively transmit contents, 
which are either stored in the local cache or fetched from the CP, 
to multiple users in the network. Users requesting the same content
form a multicast group and are served by the same cluster of BSs
cooperatively using multicast beamforming. Assuming fixed cache 
placement, this paper investigates the joint design of multicast 
beamforming and content-centric BS clustering by formulating 
an optimization problem of minimizing the total network cost 
under the quality-of-service (QoS) constraints for each multicast 
group. The network cost involves both the transmission power 
and the backhaul cost. We model the backhaul cost using the
mixed $\ell_0/\ell_2$-norm of beamforming vectors. To solve this non-convex
problem, we first approximate it using the semidefinite
relaxation (SDR) method and concave smooth functions. We then
propose a difference of convex functions (DC) programming algorithm
to obtain suboptimal solutions and show the connection of
three smooth functions. Simulation results validate the advantage
of multicasting and show the effects of different cache sizes and
caching policies in cloud RAN.

I. Introduction 

Cloud radio access network (RAN) is an emerging network 
architecture capable of exploiting the advantage of multicell 
cooperation in the future fifth-generation (5G) wireless system
[1]. In a cloud RAN, the base stations (BSs) are connected to a
central processor (CP) via digital backhaul links, thus enabling
joint data processing and precoding capabilities across multiple
BSs. This paper proposes a content-centric view for cloud
RAN design. We equip the BSs with finite-size cache, where 
popular contents desired by multiple users can be stored. We 
formulate and solve a network optimization problem while 
accounting for the finite-capacity backhaul links between the 
BSs and the CP. 

To address the issue of limited backhaul, previous works on
wireless cooperative networks [2]-[4] consider the problem of
minimizing the backhaul traffic and transmission power by
designing sparse beamformers and user-centric BS clustering.
Further, [5] considers the weighted sum rate (WSR) optimization
problem under per-BS backhaul constraints. However, all
these works focus on the unicast scenario and promote a user-centric
view of system design without considering the effect
of caching.

Recently, wireless caching has been investigated as an 
effective way of reducing peak traffic and backhaul load. 
By deploying caches at BSs and placing popular contents 
in them in advance, the issue of limited backhaul capacity 
can be addressed fundamentally. In [6], the authors show that
with small or even no backhaul capacity, femto-caching can
support high demand of wireless video distribution. In [7],
the upper and lower bounds of the capacity of the caching 
system are derived, and it is shown that the network capacity 
could be further improved by using coded multicasting for 
content delivery. These studies motivate us to consider the 
cache-enabled cloud RAN, where each BS is equipped with 
a cache with finite storage size. Compared with cooperative 
networks without caching, cache-enabled cloud RAN can 
fundamentally reduce the backhaul cost and support more 
flexible BS clustering. We note that in [8], a similar wireless
caching network has been considered, where the authors study 
the data assignment and unicast beamforming design. 

Different from previous work focused on unicast [2]-[5],
[8], where data is transmitted to each user individually no
matter whether the actual contents requested by different users 
are the same or not, this paper focuses on the problem of 
multicast transmission. We assume that multiple users can 
request the same content, and the content is delivered using 
multicast beamforming to these users on the same resource 
block. Compared with traditional unicast, multicast can improve
energy and spectral efficiency. In addition, since the
popular contents cached in the BSs are possibly requested by 
multiple users, multicast could better exploit the potential of 
wireless caching. 

This paper studies the joint design of multicast beamforming 
and content-centric BS clustering, which differs from the 
fixed BS clustering in coordinated multicell multicast networks
[9] or user-centric BS clustering in unicast systems El. In
each scheduling interval, the BS clustering is dynamically 
optimized with respect to each multicast group. We formulate 
an optimization problem with the objective of minimizing the 
total power consumption as well as the backhaul cost under the 
quality-of-service (QoS) constraints for each multicast group. 
The backhaul cost is formulated as a function of the mixed
$\ell_0/\ell_2$-norm of the beamforming vectors. The challenge in
solving such a problem is due to both the non-convex QoS
constraints and the $\ell_0$-norm in the backhaul cost. In this
paper, we first use the semidefinite relaxation (SDR) method
introduced in [10] to handle the non-convex QoS constraints.
We then adopt the smooth function approach in sparse signal
processing to approximate the $\ell_0$-norm with concave smooth
functions.

In sparse signal processing, one approach to handle the
$\ell_0$-norm minimization problem is to approximate the $\ell_0$-norm
with its reweighted $\ell_1$-norm [11] and update the weight
factors iteratively. Another approach is the smooth function
method [12], where the authors use Gaussian family functions
to approximate the $\ell_0$-norm. The smooth function method is
a better approximation to the $\ell_0$-norm but its performance
highly depends on the smoothness factor of the approximation
function. In this paper, we adopt the smooth function approach
and solve the approximated problem with a difference of
convex functions (DC) algorithm [13]. We explore the use
of three different smooth functions, the logarithmic function,
the exponential function, and the arctangent function, and
show that with a particular weight updating rule, all three
are equivalent to the reweighted $\ell_1$-norm minimization [11].
Simulation results are presented to illustrate the performance
of the proposed algorithm and the benefit of wireless caching.

Notations: Boldface uppercase letters denote matrices and 
boldface lowercase letters denote column vectors. The sets 
of complex numbers and binary numbers are denoted as 
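$\mathbb{C}$ and $\mathbb{B}$, respectively. The statistical expectation, transpose
and Hermitian transpose are denoted as $\mathbb{E}(\cdot)$, $(\cdot)^T$ and $(\cdot)^H$,
respectively. The Frobenius norm and the $\ell_0$-norm are denoted
as $\|\cdot\|_2$ and $\|\cdot\|_0$, respectively. An all-one vector of length $M$
is denoted as $\mathbf{1}_M$. An all-zero vector of length $M$ is denoted
as $\mathbf{0}_M$. The inner product of matrices $\mathbf{X}$ and $\mathbf{Y}$ is defined as
$\langle \mathbf{X}, \mathbf{Y} \rangle = \mathrm{Tr}(\mathbf{X}^H \mathbf{Y})$. For a square matrix $\mathbf{S}_{M \times M}$, $\mathbf{S} \succeq \mathbf{0}$
means that $\mathbf{S}$ is positive semidefinite.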

II. System Model 

A. Signal Model 

We consider a cache-enabled cloud RAN with one CP, 
L BSs and K mobile users. Each BS is equipped with Nt 
antennas and each user has a single antenna. Scheduling and 
beamformer design are done at the CP. Each BS is connected 
to the CP via a finite-capacity backhaul link. The total number 
of contents is F; different contents are independent. The CP 
stores all the contents and there is a cache at each BS, which 
stores a finite number of contents. Each user requests a content
according to the content popularity, and users requesting the
same content form a multicast group. We assume that the total
number of multicast groups is M. The set of users in group
m is denoted as $\mathcal{K}_m$ with $\sum_{m=1}^{M} |\mathcal{K}_m| = K$.

We study the cooperative downlink multicast transmission 
and dynamic content-centric BS clustering. Each group m is 
served by a cluster of BSs cooperatively, denoted as $\mathcal{Q}_m$.
In each scheduling interval, the BS clustering $\{\mathcal{Q}_m\}_{m=1}^{M}$ is
dynamically optimized by the CP. For example, in Fig. 1,
the instantaneous BS clusters for different groups are $\mathcal{Q}_1 = \{1,2,3\}$,
$\mathcal{Q}_2 = \{2\}$ and $\mathcal{Q}_3 = \{2,3\}$, respectively. For BS
3, since it serves both group 1 and group 3, it should acquire
the contents for these two groups either from its local cache
or through backhaul.

Fig. 1: An example of downlink cloud RAN with M = 3
groups and L = 3 cache-enabled BSs connected to a CP via
digital backhaul links, where each multicast group is served
by a cluster of BSs.

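We denote the aggregate beamforming vector of group m
from all BSs as $\mathbf{w}_m = [\mathbf{w}_{1,m}^H, \mathbf{w}_{2,m}^H, \cdots, \mathbf{w}_{L,m}^H]^H \in \mathbb{C}^{LN_t \times 1}$,
where $\mathbf{w}_{l,m} \in \mathbb{C}^{N_t \times 1}$ is the beamforming vector for group
m at BS l. Note that the BS clustering is implicitly defined
by the beamforming vectors. If the beamforming vector $\mathbf{w}_{l,m}$
is $\mathbf{0}_{N_t}$, then BS l does not serve group m and is thus not
in $\mathcal{Q}_m$. On the other hand, if $\mathbf{w}_{l,m} \neq \mathbf{0}_{N_t}$, BS l is part
of the serving cluster of group m. Thus, the size of the
BS cooperation cluster for group m can be expressed as a
mixed $\ell_0/\ell_2$-norm of the beamforming vectors $\{\mathbf{w}_{l,m}\}_{l=1}^{L}$, i.e.,

$$|\mathcal{Q}_m| = \sum_{l=1}^{L} \big\| \|\mathbf{w}_{l,m}\|_2^2 \big\|_0.$$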
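
The implicit definition of the serving cluster can be illustrated with a few lines of Python. This is a minimal sketch rather than the paper's implementation: the helper names `block_norm` and `serving_cluster`, the numerical tolerance `tol` (standing in for an exact zero test), and the toy beamformer values are all assumptions made for illustration.

```python
import math

def block_norm(w_block):
    """Euclidean norm of one BS's beamforming sub-vector (list of complex)."""
    return math.sqrt(sum(abs(x) ** 2 for x in w_block))

def serving_cluster(w_blocks, tol=1e-9):
    """Return the 1-based indices of BSs whose sub-vector is nonzero.

    w_blocks[l] is the hypothetical per-BS vector w_{l+1,m}; tol is an
    assumed numerical threshold that replaces an exact comparison with zero.
    """
    return [l + 1 for l, blk in enumerate(w_blocks) if block_norm(blk) > tol]

# Toy group with L = 3 BSs and N_t = 2 antennas; BS 2 is switched off.
w_m = [[0.3 + 0.1j, -0.2j], [0j, 0j], [0.5 + 0j, 0.1 - 0.4j]]
cluster = serving_cluster(w_m)   # BSs that serve the group
cluster_size = len(cluster)      # value of the mixed l0/l2-norm
```

In this toy case the cluster contains BSs 1 and 3, so the mixed norm evaluates to 2.
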

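We denote the data symbol of the content requested by
group m as $s_m \in \mathbb{C}$, with $\mathbb{E}[|s_m|^2] = 1$. For user $k \in \mathcal{K}_m$,
its received downlink signal $y_k$ can be written as

$$y_k = \mathbf{h}_k^H \mathbf{w}_m s_m + \sum_{n \neq m}^{M} \mathbf{h}_k^H \mathbf{w}_n s_n + z_k, \tag{1}$$

where $\mathbf{h}_k \in \mathbb{C}^{LN_t \times 1}$ is the network-wide channel vector from
all BSs to user k and $z_k \sim \mathcal{CN}(0, \sigma^2)$ is the additive noise.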

The received signal-to-interference-plus-noise ratio (SINR)
at user $k \in \mathcal{K}_m$ is

$$\mathrm{SINR}_k = \frac{|\mathbf{h}_k^H \mathbf{w}_m|^2}{\sum_{n \neq m}^{M} |\mathbf{h}_k^H \mathbf{w}_n|^2 + \sigma^2}. \tag{2}$$

We define the target SINR vector as $\boldsymbol{\gamma} = [\gamma_1, \gamma_2, \cdots, \gamma_M]$
with each element $\gamma_m$ being the target SINR to be achieved
by the users in group m. In this paper, we consider the fixed
rate transmission as in El, where the transmission rate for
group m is set as $R_m = \log_2(1 + \gamma_m)$. Thus, to successfully
decode the message, for any user $k \in \mathcal{K}_m$, its achievable
data rate should be at least $R_m$, that is, for $\forall m, k \in \mathcal{K}_m$,

$$\log_2(1 + \mathrm{SINR}_k) \geq R_m.$$
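
The SINR expression and the rate condition can be checked numerically. The following sketch uses only plain Python complex numbers; the function names `inner` and `sinr`, the two-group channel and beamformer values, and the noise power are assumptions chosen for illustration, not values from the paper.

```python
import math

def inner(h, w):
    """Compute h^H w for two equal-length lists of complex numbers."""
    return sum(hc.conjugate() * wc for hc, wc in zip(h, w))

def sinr(h_k, beamformers, m, noise_power):
    """SINR of a user in group m: desired power over interference plus noise."""
    signal = abs(inner(h_k, beamformers[m])) ** 2
    interference = sum(abs(inner(h_k, w)) ** 2
                       for n, w in enumerate(beamformers) if n != m)
    return signal / (interference + noise_power)

# Toy numbers (assumed): two groups, aggregate vectors of length L * N_t = 2.
beamformers = [[1 + 0j, 0j], [0j, 1 + 0j]]
h_k = [2 + 0j, 0.5j]

gamma_db = 10.0
gamma = 10 ** (gamma_db / 10)        # 10 dB target SINR in linear scale
rate = math.log2(1 + gamma)          # fixed rate R_m in bit/s/Hz

value = sinr(h_k, beamformers, 0, noise_power=0.1)
qos_met = math.log2(1 + value) >= rate   # rate condition for this user
```

Because the logarithm is monotonic, the rate condition is equivalent to requiring that the SINR is at least the target $\gamma_m$, which is the form used later in the optimization constraints.
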
B. Cache Model

We assume that each content has normalized size of 1 and
the local storage size of BS l is $F_l$ ($F_l < F$), which is also
the maximum number of contents it can store. Therefore, we
define a cache placement matrix $\mathbf{C} \in \mathbb{B}^{L \times F}$, where $c_{l,f} = 1$
means the content f is cached in BS l and $c_{l,f} = 0$ means the
opposite. Note that $\sum_{f=1}^{F} c_{l,f} \leq F_l$, $\forall l$.

We assume that the cache placement is static, that is, matrix 
C is fixed and is known at the CP (similar assumptions have 
been made in previous literature, e.g., ID). This assumption 
is based on the fact that the optimization of cache placement 
is performed on a large time scale; while the beamforming 
design is done on a small time scale of channel coherence 
time. Hence, it is reasonable to assume that during a short
scheduling interval, the cache placement policy remains unchanged.

C. Cost Model 

We consider the total network cost which consists of both 
the transmission power and the backhaul cost. Let $f_m$ denote
the content requested by users in multicast group m. For BS l
in $\mathcal{Q}_m$, if content $f_m$ is in its cache, it can access the content
directly without costing backhaul. On the contrary, if content
$f_m$ is not cached, BS l needs to first fetch this content from
the CP via the backhaul link. Since the data rates of fetching 
the contents from the CP need to be as large as the content- 
delivery rate, the backhaul cost in this case is modeled as the 
transmission rates of multicast groups. 

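The total backhaul cost at all BSs can be written as

$$C_B = \sum_{m=1}^{M} \sum_{l=1}^{L} \big\| \|\mathbf{w}_{l,m}\|_2^2 \big\|_0 R_m (1 - c_{l,f_m}). \tag{3}$$

The total network cost can be written as

$$C_N = \underbrace{\sum_{m=1}^{M} \eta \|\mathbf{w}_m\|_2^2}_{\text{Power Consumption}} + \underbrace{\sum_{m=1}^{M} \sum_{l=1}^{L} \big\| \|\mathbf{w}_{l,m}\|_2^2 \big\|_0 R_m (1 - c_{l,f_m})}_{\text{Backhaul Cost}}, \tag{4}$$

where $\eta$ is a weight parameter. By adjusting the value of $\eta$,
we can emphasize one cost over the other.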

Note that in a network without caching, there is a tradeoff 
between power and backhaul cost. To reduce power consumption,
each group can be served by more BSs, which increases
backhaul cost. However, in cache-enabled cloud RAN, for each 
group, the BSs caching the requested content can be involved 
in the cooperative cluster of the group without costing extra 
backhaul. 
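
The cost model can be evaluated directly for a given set of beamformers. The sketch below is illustrative only: the function `network_cost`, its argument layout, the tolerance `tol` used to decide whether a BS is active, and the toy instance are assumptions rather than anything specified in the paper.

```python
import math

def block_power(blk):
    """Squared Euclidean norm of one per-BS beamforming sub-vector."""
    return sum(abs(x) ** 2 for x in blk)

def network_cost(w, rates, cache, requested, eta, tol=1e-12):
    """Return (power, backhaul, total) for the weighted network cost.

    w[m][l]      : sub-vector of group m at BS l (list of complex)
    rates[m]     : transmission rate R_m of group m
    cache[l][f]  : 1 if content f is cached at BS l, else 0
    requested[m] : index f_m of the content requested by group m
    """
    power = sum(block_power(blk) for w_m in w for blk in w_m)
    backhaul = 0.0
    for m, w_m in enumerate(w):
        for l, blk in enumerate(w_m):
            active = block_power(blk) > tol           # l0-norm indicator
            if active and cache[l][requested[m]] == 0:
                backhaul += rates[m]                  # content fetched from the CP
    return power, backhaul, eta * power + backhaul

# Toy instance (all numbers assumed): M = 2 groups, L = 2 BSs, F = 3 contents.
w = [[[1 + 0j], [1j]],         # group 0 is served by both BSs
     [[0j], [2 + 0j]]]         # group 1 is served by BS 1 only
rates = [math.log2(1 + 10.0)] * 2
cache = [[1, 0, 0],            # BS 0 caches content 0
         [0, 0, 1]]            # BS 1 caches content 2
requested = [0, 2]
power, backhaul, total = network_cost(w, rates, cache, requested, eta=0.5)
```

In this toy instance only BS 1 serving group 0 pays backhaul, since it is the only active BS that does not hold the requested content.
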

III. Problem Formulation and Approximation

In this section, we present the optimization problem of minimizing
the total network cost by jointly designing multicast
beamforming and BS clustering. We show that this problem 
is a non-convex problem and further approximate it with two 
steps. 


A. Problem Formulation

Our objective is to minimize the total network cost, under
the constraints of the peak transmission power at each BS and 
the SINR requirement of each multicast group. 

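The optimization problem is formulated as

$$\mathcal{P}_0: \quad \underset{\{\mathbf{w}_m\}_{m=1}^{M}}{\text{minimize}} \quad C_N \tag{5a}$$
$$\text{subject to} \quad \mathrm{SINR}_k \geq \gamma_m, \ \forall m, k \in \mathcal{K}_m \tag{5b}$$
$$\sum_{m=1}^{M} \|\mathbf{w}_{l,m}\|_2^2 \leq P_l, \ \forall l \tag{5c}$$

where $P_l$ is the peak transmission power at BS l.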

Problem $\mathcal{P}_0$ is a non-convex problem, where the non-convexity
comes from both the $\ell_0$-norm in the objective
function and the SINR constraints. Unlike the traditional unicast
beamforming problem, where the non-convex SINR constraints
can be transformed to a second-order cone programming
(SOCP) problem and the optimal solutions can be obtained
with convex optimization, the multicast beamforming problem
is NP-hard [10]. In this paper, we use two techniques to
approximate problem $\mathcal{P}_0$, namely SDR relaxation and $\ell_0$-norm
approximation. The overall procedure to solve $\mathcal{P}_0$ is shown in
Fig. 2 with each step elaborated in the following sections.


B. Step 1 - SDR Relaxation 

In both single-cell [10] and multicell [9] scenarios, the
semidefinite relaxation (SDR) method has been used to deal
with the non-convex SINR constraints in multicast beamforming
design problems.

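We define two sets of matrices $\{\mathbf{W}_m \in \mathbb{C}^{LN_t \times LN_t}\}_{m=1}^{M}$
and $\{\mathbf{H}_k \in \mathbb{C}^{LN_t \times LN_t}\}_{k=1}^{K}$ as

$$\mathbf{W}_m = \mathbf{w}_m \mathbf{w}_m^H \quad \text{and} \quad \mathbf{H}_k = \mathbf{h}_k \mathbf{h}_k^H. \tag{6}$$

We further define a set of selecting matrices $\{\mathbf{J}_l\}_{l=1}^{L}$, where
each matrix $\mathbf{J}_l \in \mathbb{B}^{LN_t \times LN_t}$ is a diagonal matrix defined as

$$\mathbf{J}_l = \mathrm{diag}\left( \left[ \mathbf{0}_{(l-1)N_t}^H, \mathbf{1}_{N_t}^H, \mathbf{0}_{(L-l)N_t}^H \right] \right), \ \forall l. \tag{7}$$

Therefore, we have

$$\|\mathbf{w}_{l,m}\|_2^2 = \mathrm{Tr}(\mathbf{W}_m \mathbf{J}_l), \ \forall l, m. \tag{8}$$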
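
The identity between the per-BS power and the trace with the selecting matrix can be verified on a small example. In this sketch the function names `outer_diag`, `selector_diag` and `trace_w_j`, the 0-based BS index, and the toy vector are assumptions. It relies on the fact that the selecting matrix is diagonal, so only the diagonal of the lifted matrix is needed.

```python
def outer_diag(w):
    """Diagonal of W = w w^H, i.e. |w_i|^2 for every antenna index i."""
    return [abs(x) ** 2 for x in w]

def selector_diag(l, num_bs, n_t):
    """Diagonal of the selecting matrix J_l (0-based l): ones on block l only."""
    return [1 if l * n_t <= i < (l + 1) * n_t else 0
            for i in range(num_bs * n_t)]

def trace_w_j(w, l, num_bs, n_t):
    """Tr(W J_l) for diagonal J_l, computed from the diagonal of W."""
    return sum(d * j for d, j in zip(outer_diag(w), selector_diag(l, num_bs, n_t)))

# Toy aggregate beamformer with L = 3 BSs and N_t = 2 antennas (assumed values).
w_m = [1 + 1j, 2 + 0j, 0j, 0j, 0.5j, -1 + 0j]
per_bs_power = [trace_w_j(w_m, l, 3, 2) for l in range(3)]
```

Here the three traces are approximately 6, 0 and 1.25, matching the squared norms of the three per-BS blocks; the zero entry marks a BS outside the serving cluster.
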


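By adopting the SDR method, problem $\mathcal{P}_0$ can be relaxed
and rewritten as

$$\mathcal{P}_{\mathrm{SDR}}: \quad \underset{\{\mathbf{W}_m\}_{m=1}^{M}}{\text{minimize}} \quad \sum_{m=1}^{M} \eta \, \mathrm{Tr}(\mathbf{W}_m) + \sum_{m=1}^{M} \sum_{l=1}^{L} \alpha_{l,m} \left\| \mathrm{Tr}(\mathbf{W}_m \mathbf{J}_l) \right\|_0 \tag{9a}$$
$$\text{subject to} \quad \frac{\mathrm{Tr}(\mathbf{W}_m \mathbf{H}_k)}{\sum_{n \neq m}^{M} \mathrm{Tr}(\mathbf{W}_n \mathbf{H}_k) + \sigma^2} \geq \gamma_m, \ \forall m, k \in \mathcal{K}_m \tag{9b}$$
$$\sum_{m=1}^{M} \mathrm{Tr}(\mathbf{W}_m \mathbf{J}_l) \leq P_l, \ \forall l \tag{9c}$$
$$\mathbf{W}_m \succeq \mathbf{0}, \ \forall m \tag{9d}$$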
where, to further simplify the mathematical representation, we
have defined a set of constants $\alpha_{l,m} = R_m (1 - c_{l,f_m})$.

The SINR constraints in problem $\mathcal{P}_{\mathrm{SDR}}$ are convex. We
denote the resulting optimal $\{\mathbf{W}_m\}$ after solving problem
$\mathcal{P}_{\mathrm{SDR}}$ as $\{\mathbf{W}_m^*\}$. If $\mathbf{W}_m^*$ is already rank-one, then for group
m the optimal aggregate beamformer $\mathbf{w}_m^*$ of problem $\mathcal{P}_0$ can
be obtained by applying eigen-value decomposition to $\mathbf{W}_m^*$ as
$\mathbf{W}_m^* = \lambda_m^* \mathbf{w}_m \mathbf{w}_m^H$ and taking $\mathbf{w}_m^* = \sqrt{\lambda_m^*} \mathbf{w}_m$. Otherwise,
the beamforming vectors $\{\mathbf{w}_m\}$ can be generated with the
randomization method used in [9] and [10].
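
The rank-one case can be sketched in code. Note the substitution: instead of a full eigen-value decomposition, the example below uses power iteration to find the principal eigenpair, which is sufficient when the matrix has rank one. The function names, the start vector and the toy data are assumptions, and the randomization step for higher-rank solutions is not shown.

```python
import math

def mat_vec(W, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in W]

def principal_component(W, iters=200):
    """Power iteration: (largest eigenvalue, unit eigenvector) of Hermitian PSD W."""
    n = len(W)
    v = [complex(1.0 / math.sqrt(n))] * n   # assumed start vector
    for _ in range(iters):
        u = mat_vec(W, v)
        norm = math.sqrt(sum(abs(x) ** 2 for x in u))
        v = [x / norm for x in u]
    Wv = mat_vec(W, v)
    lam = sum(x.conjugate() * y for x, y in zip(v, Wv)).real
    return lam, v

def beamformer_from_rank_one(W):
    """Recover w with w w^H = W (up to a common phase) when W has rank one."""
    lam, v = principal_component(W)
    return [math.sqrt(lam) * x for x in v]

# Toy check: build W = w w^H from a known vector, then recover a beamformer.
w_true = [1 + 2j, -0.5j, 0.3 + 0j]
W = [[a * b.conjugate() for b in w_true] for a in w_true]
w_hat = beamformer_from_rank_one(W)
W_hat = [[a * b.conjugate() for b in w_hat] for a in w_hat]
max_err = max(abs(W[i][j] - W_hat[i][j]) for i in range(3) for j in range(3))
```

The recovered vector may differ from the original by a common phase factor, which leaves both the lifted matrix and every SINR value unchanged.
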


C. Step 2 - $\ell_0$-norm Approximation

To solve the problem $\mathcal{P}_{\mathrm{SDR}}$, we further approximate the
non-convex $\ell_0$-norm in the objective with a continuous function
denoted as $f_\theta(x)$. Specifically, we consider three frequently
used smooth functions: logarithmic function, exponential function
and arctangent function ina, defined as

$$f_\theta(x) = \begin{cases} \dfrac{\log(x/\theta + 1)}{\log(1/\theta + 1)}, & \text{for log-function} \\ 1 - e^{-x/\theta}, & \text{for exp-function} \\ \dfrac{2}{\pi} \arctan\left(\dfrac{x}{\theta}\right), & \text{for atan-function} \end{cases} \tag{10}$$

where $\theta$ is a parameter to adjust the smoothness of the
functions. In all three cases, with larger $\theta$, the function is
smoother but is a worse approximation to the $\ell_0$-norm.
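
The effect of the smoothness parameter can be seen by evaluating the three surrogates at a fixed point. This sketch assumes the forms $\log(x/\theta+1)/\log(1/\theta+1)$, $1-e^{-x/\theta}$ and $(2/\pi)\arctan(x/\theta)$; the normalized logarithmic form in particular is an assumption, as are the function names and the test point `x = 0.5`.

```python
import math

def f_log(x, theta):
    """Normalized logarithmic surrogate (assumed form)."""
    return math.log(x / theta + 1) / math.log(1 / theta + 1)

def f_exp(x, theta):
    """Exponential surrogate."""
    return 1 - math.exp(-x / theta)

def f_atan(x, theta):
    """Arctangent surrogate."""
    return (2 / math.pi) * math.atan(x / theta)

def l0(x):
    """l0 'norm' of a nonnegative scalar: 0 at zero, 1 otherwise."""
    return 0.0 if x == 0 else 1.0

x = 0.5
thetas = (1.0, 0.1, 0.01)
# Approximation error |f_theta(x) - l0(x)| for decreasing theta.
errors = {name: [abs(f(x, th) - l0(x)) for th in thetas]
          for name, f in (("log", f_log), ("exp", f_exp), ("atan", f_atan))}
```

For each surrogate the error shrinks as the smoothness parameter decreases, at the price of a sharper, less smooth function near zero. The logarithmic surrogate approaches the indicator most slowly of the three at this test point.
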

In [12], the authors use the Gaussian family smooth functions,
where the $\ell_0$-norm of $\|\mathbf{w}\|_2$ is approximated with
$f(\|\mathbf{w}\|_2) = 1 - \exp\left(-\frac{\|\mathbf{w}\|_2^2}{2\theta^2}\right)$. In this work, by adopting the
exponential smooth function, the $\ell_0$-norm is approximated
with $f_{\exp}(\mathbf{W}) = 1 - \exp\left(-\frac{\mathrm{Tr}(\mathbf{W})}{\theta}\right)$. Comparing $f(\|\mathbf{w}\|_2)$ and
$f_{\exp}(\mathbf{W})$, we can see that the exponential smooth function in
(10) has the same approximation effect as the Gaussian smooth
function in [12].

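With the smooth function, $\mathcal{P}_{\mathrm{SDR}}$ can be rewritten as

$$\mathcal{P}_{\mathrm{SF}}: \quad \underset{\{\mathbf{W}_m\}_{m=1}^{M}}{\text{minimize}} \quad \sum_{m=1}^{M} \eta \, \mathrm{Tr}(\mathbf{W}_m) + \sum_{m=1}^{M} \sum_{l=1}^{L} \alpha_{l,m} f_\theta\left(\mathrm{Tr}(\mathbf{W}_m \mathbf{J}_l)\right) \tag{11a}$$
$$\text{subject to} \quad \text{(9b), (9c), (9d).} \tag{11b}$$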


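For ease of presentation, we express the objective as the
summation of two functions $G(\mathbf{W})$ and $F(\mathbf{W})$, defined as

$$G(\mathbf{W}) = \sum_{m=1}^{M} \eta \, \mathrm{Tr}(\mathbf{W}_m) \quad \text{and} \quad F(\mathbf{W}) = \sum_{m=1}^{M} \sum_{l=1}^{L} \alpha_{l,m} f_{l,m}, \tag{12}$$

where $f_{l,m} = f_\theta\left(\mathrm{Tr}(\mathbf{W}_m \mathbf{J}_l)\right)$.

We see that $G(\mathbf{W})$ and $F(\mathbf{W})$ are an affine and a concave
function of $\mathbf{W}$, respectively, so problem $\mathcal{P}_{\mathrm{SF}}$ can be viewed
as the difference of two continuous convex functions with
convex constraints. Therefore, this problem can be solved with
the DC algorithm, which falls in the category of majorization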
minimization (MM) algorithms (El. 
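
The majorization step behind this claim can be written out explicitly. The following is a sketch under the assumptions that each $f_{l,m}$ is concave and differentiable and that every weight $\alpha_{l,m}$ is nonnegative. The shorthand $\hat{F}$ and the iteration index $t$ are notation introduced here for illustration.

```latex
% Concavity gives a global linear upper bound around any reference point W^{(t-1)}:
\[
F(\mathbf{W}) \le \hat{F}\big(\mathbf{W};\mathbf{W}^{(t-1)}\big)
  := F\big(\mathbf{W}^{(t-1)}\big)
   + \sum_{m=1}^{M}\sum_{l=1}^{L} \alpha_{l,m}
     \left\langle \nabla_{\mathbf{W}_m^{(t-1)}} f_{l,m},\;
                  \mathbf{W}_m - \mathbf{W}_m^{(t-1)} \right\rangle .
\]
% If W^{(t)} minimizes G + \hat{F} over the convex feasible set, and W^{(t-1)} is feasible,
% the true objective cannot increase from one iteration to the next:
\[
\begin{aligned}
G\big(\mathbf{W}^{(t)}\big) + F\big(\mathbf{W}^{(t)}\big)
  &\le G\big(\mathbf{W}^{(t)}\big) + \hat{F}\big(\mathbf{W}^{(t)};\mathbf{W}^{(t-1)}\big) \\
  &\le G\big(\mathbf{W}^{(t-1)}\big) + \hat{F}\big(\mathbf{W}^{(t-1)};\mathbf{W}^{(t-1)}\big)
   = G\big(\mathbf{W}^{(t-1)}\big) + F\big(\mathbf{W}^{(t-1)}\big).
\end{aligned}
\]
```

The first inequality is the upper bound itself, and the second uses the optimality of the new iterate for the surrogate problem. The constant term of the bound does not affect the minimizer.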



Fig. 2: Overall procedure for solving problem $\mathcal{P}_0$


IV. DC Based Algorithm and Analysis 

In this section, we first present the DC based algorithm 
for solving the problem $\mathcal{P}_{\mathrm{SF}}$ using the logarithmic smooth
function. We then show that the resulting algorithm can also
be interpreted as a DC algorithm with the other two smooth
functions and a different $\theta$ updating rule.


A. DC Based Algorithm with Log-Function 

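The DC algorithm iteratively optimizes an approximated
convex function of the original concave objective function
and produces a sequence of improving $\{\mathbf{W}_m\}$. The algorithm
converges to some global/local optimal solution.

The initial $\{\mathbf{W}_m^{(0)}\}$ is found by solving the following power
minimization problem

$$\mathcal{P}_{\mathrm{ini}}: \quad V_{\mathrm{ini}} = \underset{\{\mathbf{W}_m\}_{m=1}^{M}}{\text{minimize}} \quad \sum_{m=1}^{M} \mathrm{Tr}(\mathbf{W}_m) \tag{13a}$$
$$\text{subject to} \quad \text{(9b), (9c), (9d).} \tag{13b}$$

In the t-th iteration, $\{\mathbf{W}_m^{(t)}\}$ is generated as the solution of
the approximated convex optimization problem

$$\mathcal{P}_t: \quad V_t = \underset{\{\mathbf{W}_m\}_{m=1}^{M}}{\text{minimize}} \quad G(\mathbf{W}) + \sum_{m=1}^{M} \sum_{l=1}^{L} \alpha_{l,m} \left\langle \nabla_{\mathbf{W}_m^{(t-1)}} f_{l,m}, \, \mathbf{W}_m - \mathbf{W}_m^{(t-1)} \right\rangle \tag{14a}$$
$$\text{subject to} \quad \text{(9b), (9c), (9d),} \tag{14b}$$

where $\nabla_{\mathbf{W}_m^{(t-1)}} f_{l,m}$ is the gradient matrix of $f_{l,m}$ at $\mathbf{W}_m^{(t-1)}$.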

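Specifically, for the log-function, the gradient matrix
$\nabla_{\mathbf{W}_m^{(t-1)}} f_{l,m} \in \mathbb{C}^{LN_t \times LN_t}$ at $\{\mathbf{W}_m^{(t-1)}\}$ is

$$\nabla_{\mathbf{W}_m^{(t-1)}} f_{l,m} = \frac{1}{\log(1/\theta_{l,m} + 1)} \cdot \frac{\mathbf{J}_l}{\mathrm{Tr}(\mathbf{W}_m^{(t-1)} \mathbf{J}_l) + \theta_{l,m}}. \tag{15}$$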


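If we let the smoothness factor $\theta_{l,m} = \epsilon$, where $\epsilon$ is a very
small positive constant, then the gradient of the log-function is
proportional to

$$D_{\log}(l,m) = \frac{\mathbf{J}_l}{\mathrm{Tr}(\mathbf{W}_m^{(t-1)} \mathbf{J}_l) + \epsilon}. \tag{16}$$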
Note that (16) has a similar form to the weight factor
of the reweighted $\ell_1$-norm approach in [11]. Thus, the DC
algorithm with log-function and smoothness factor $\theta_{l,m} = \epsilon$
is just the reweighted $\ell_1$-norm minimization algorithm of [11].

In (14a), function $F(\mathbf{W})$ is approximated with its first-order
Taylor expansion, which provides an upper bound. This
algorithm terminates when the sequence of $\{\mathbf{W}_m^{(t)}\}$ converges
to some stationary point, and the objective value $V_t$ converges,
that is, $V_{t-1} - V_t < \rho$, where $\rho$ is a small constant.
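
The reweighting iteration can be illustrated on a deliberately simplified toy problem. This is not the semidefinite program of the paper: the convex subproblem is replaced here by a linear program with a single covering constraint, which has a closed-form solution, so no solver is needed. All names (`solve_weighted_lp`, `dc_reweighting`), the gains, the weights and the demand are assumptions for illustration.

```python
def solve_weighted_lp(costs, gains, demand):
    """Minimize sum(c_i x_i) subject to sum(g_i x_i) >= demand and x >= 0.

    With one covering constraint, the optimum loads the single index with the
    smallest cost-to-gain ratio.
    """
    best = min(range(len(costs)), key=lambda i: costs[i] / gains[i])
    x = [0.0] * len(costs)
    x[best] = demand / gains[best]
    return x

def dc_reweighting(gains, alphas, demand, eta, eps=1e-3, rho=1e-6, max_iter=50):
    """Iterate x(t) = argmin sum_i (eta + alpha_i / (x_i(t-1) + eps)) * x_i."""
    x = solve_weighted_lp([1.0] * len(gains), gains, demand)  # power-only start
    prev_value = float("inf")
    history = [x]
    for _ in range(max_iter):
        # Reweighting step: weight of the same form as the log-function gradient.
        weights = [a / (xi + eps) for a, xi in zip(alphas, x)]
        costs = [eta + w for w in weights]
        x = solve_weighted_lp(costs, gains, demand)
        value = sum(c * xi for c, xi in zip(costs, x))
        history.append(x)
        if prev_value - value < rho:   # stopping rule of the form V_{t-1} - V_t < rho
            break
        prev_value = value
    return x, history

# Two BSs (assumed numbers): BS 0 has the better channel but must fetch the
# content over backhaul (alpha = R_m); BS 1 has it cached (alpha = 0).
x_final, history = dc_reweighting(gains=[1.0, 0.8], alphas=[3.46, 0.0],
                                  demand=1.0, eta=1.0)
```

The power-only start selects BS 0, which has the stronger channel. After one reweighting step the solution moves to the cached BS 1 and stays there, mirroring how the weights steer the cluster away from BSs that would incur backhaul cost.
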


B. Updating Rule of $\theta$ for Other Smooth Functions

The performance of $\ell_0$-norm approximation algorithms depends
on the smoothness factor $\theta$. Intuitively, when x is large,
$\theta$ should be large so that the approximation algorithm can
explore the entire parameter space; when x is small, $\theta$ should
be small so that $f_\theta(x)$ has behavior close to the $\ell_0$-norm. In [12],
the authors propose to use a decreasing sequence of $\theta$, but the
updating rule does not depend on x.

In this paper, we explore a novel $\theta$ updating rule that
achieves the above effect automatically using a sequence
of $\theta$ which depends on the specific x in each iteration. More
specifically, we propose to set $\theta$ to be the one that maximizes
the gradient of the approximation function.

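Note that the gradient matrices of the exponential and
arctangent functions in (10) are, respectively,

$$D_{\exp}(l,m) = \frac{\mathbf{J}_l}{\theta_{l,m}} e^{-\frac{\mathrm{Tr}(\mathbf{W}_m^{(t-1)} \mathbf{J}_l)}{\theta_{l,m}}}, \tag{17}$$

and

$$D_{\mathrm{atan}}(l,m) = \frac{2}{\pi} \cdot \frac{\theta_{l,m} \mathbf{J}_l}{\theta_{l,m}^2 + \mathrm{Tr}^2(\mathbf{W}_m^{(t-1)} \mathbf{J}_l)}. \tag{18}$$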


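An interesting observation is that for all three functions in
(10), if we maximize their gradients, we get expressions of the
same form. Specifically, for the log-function, we get (16) with
optimal $\theta_{l,m}^* \to 0$. For the exp-function and atan-function, we
get

$$D_{\exp}^*(l,m) = \max_{\theta_{l,m}} D_{\exp}(l,m) = \frac{\mathbf{J}_l}{e \cdot \mathrm{Tr}(\mathbf{W}_m^{(t-1)} \mathbf{J}_l)}, \tag{19}$$

$$D_{\mathrm{atan}}^*(l,m) = \max_{\theta_{l,m}} D_{\mathrm{atan}}(l,m) = \frac{\mathbf{J}_l}{\pi \cdot \mathrm{Tr}(\mathbf{W}_m^{(t-1)} \mathbf{J}_l)}, \tag{20}$$

respectively, with the optimal $\theta_{l,m}^* = \mathrm{Tr}(\mathbf{W}_m^{(t-1)} \mathbf{J}_l)$.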
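
The maximizing smoothness factor for the exponential and arctangent cases can be checked with elementary calculus. In the sketch below, $x > 0$ stands for the trace term and $g_{\exp}$, $g_{\mathrm{atan}}$ denote the scalar factors multiplying the selecting matrix; these symbols are shorthand introduced here, not notation from the paper.

```latex
% Scalar parts of the two gradients as functions of theta, for fixed x > 0:
\[
g_{\exp}(\theta) = \frac{1}{\theta}\, e^{-x/\theta},
\qquad
g_{\mathrm{atan}}(\theta) = \frac{2}{\pi}\cdot\frac{\theta}{\theta^{2} + x^{2}} .
\]
% Derivatives with respect to theta:
\[
g_{\exp}'(\theta) = \frac{(x - \theta)\, e^{-x/\theta}}{\theta^{3}},
\qquad
g_{\mathrm{atan}}'(\theta) = \frac{2}{\pi}\cdot\frac{x^{2} - \theta^{2}}{(\theta^{2} + x^{2})^{2}} .
\]
% Both derivatives are positive for theta < x and negative for theta > x,
% so the unique maximizer is theta^* = x, with maximum values
\[
g_{\exp}(x) = \frac{1}{e\,x},
\qquad
g_{\mathrm{atan}}(x) = \frac{2}{\pi}\cdot\frac{x}{2x^{2}} = \frac{1}{\pi\,x} .
\]
```

Both maxima are inversely proportional to $x$ and differ only in the constants $1/e$ and $1/\pi$, which is the sense in which the resulting weights share the same form.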

We see that the gradient matrices for all three approximation
functions in (10) differ by a constant multiple only. Therefore,
they lead to the same algorithm if we update $\theta_{l,m}$ such that
$\nabla_{\mathbf{W}_m^{(t-1)}} f_{l,m}$ is maximized in the t-th iteration, i.e.,

$$\theta_{l,m}^{(t)} = \arg\max_{\theta_{l,m}} \nabla_{\mathbf{W}_m^{(t-1)}} f_{l,m}. \tag{21}$$

In this proposed algorithm, the approximation functions
are adjusted dynamically to achieve a good tradeoff between
smoothness and approximation to the $\ell_0$-norm. Further, the similarity
between (16), (19), and (20) means that with the proposed $\theta$
updating rule, these three approximation functions lead to
almost the same performance.


V. Simulation Results 

We consider a cache-enabled cloud RAN covering a circular area
with a radius of 1.2 km, where 7 BSs ($L = 7$, $N_t = 3$)
are placed in an equilateral triangular lattice with the distance
between adjacent BSs of 0.8 km. The total number of contents
is F = 100. A total number of 140 users are distributed in
this network with uniform distribution and they are scheduled
in a round-robin manner. In each scheduling interval, K = 14
users are scheduled. We assume half of the scheduled users
request a common content (e.g., a live video) and each
of the rest randomly requests one content according to the
content popularity, which is modeled as Zipf distribution with
skewness parameter 1. Users requesting the same content
participate in the same multicast group. We assume all BSs
have the same cache size. The channels between BSs and users
are generated with a normalized Rayleigh fading component
and a distance-dependent path loss, modeled as $PL(\mathrm{dB}) = 148.1 + 37.6 \log_{10}(d)$
with 8 dB log-normal shadowing, where
d is the distance from the user to the BS. The transmit
antenna power gain at each BS is 10 dBi. The power spectral
density of downlink noise is $-172$ dBm/Hz with the channel
bandwidth of 10 MHz. The peak transmission power of each
BS is $P_l = 10$ W, $\forall l$. The target SINR of each content is 10 dB.
We set $\rho$ = 10“® for the convergence condition in the DC
algorithm and $\epsilon$ = 10“^ in (16). Each simulation result is
averaged over 300 scheduling intervals. 
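
The link-budget quantities implied by this setup can be worked out with simple arithmetic. In the sketch below the distance unit of the path-loss formula is assumed to be kilometres, which the text does not state. The 0.4 km evaluation point and the variable names are also assumptions, and shadowing, fading and beamforming gain are ignored.

```python
import math

def path_loss_db(distance_km):
    """Path loss 148.1 + 37.6 log10(d) in dB, with d assumed to be in km."""
    return 148.1 + 37.6 * math.log10(distance_km)

def dbm_to_watt(p_dbm):
    """Convert a power level from dBm to watts."""
    return 10 ** ((p_dbm - 30) / 10)

noise_psd_dbm_hz = -172.0
bandwidth_hz = 10e6
# Noise power over the whole band: PSD plus 10 log10(bandwidth).
noise_power_dbm = noise_psd_dbm_hz + 10 * math.log10(bandwidth_hz)

peak_power_dbm = 10 * math.log10(10.0) + 30    # 10 W expressed in dBm
antenna_gain_dbi = 10.0

# Mean received SNR at 0.4 km (half the inter-BS distance) from a single BS
# transmitting at peak power.
rx_dbm = peak_power_dbm + antenna_gain_dbi - path_loss_db(0.4)
snr_db = rx_dbm - noise_power_dbm

target_rate = math.log2(1 + 10 ** (10.0 / 10))  # rate for a 10 dB target SINR
```

Under these assumptions the noise power is about -102 dBm, and the 10 dB target SINR corresponds to a rate of about 3.46 bit/s/Hz.
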

In Fig. 3, we show the effects of wireless caching and the
cache size. The popularity-aware cache refers to the policy
where each BS caches the contents with the highest popularity.
The figure shows that compared with the network without 
cache, the cache-enabled network can reduce the backhaul cost 
by more than 50% when each BS only caches 5% of the total 
contents. The backhaul reduction is up to 75% when each BS 
can cache 30% of the total contents. 
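
A rough feel for why small caches already help can be obtained from the Zipf model alone. The sketch below computes the fraction of Zipf-distributed requests that fall on the most popular contents; the function names are assumptions. It is not a reproduction of the simulation, since half of the scheduled users request a common content and the backhaul cost also depends on the optimized clustering.

```python
def zipf_popularity(num_contents, skew=1.0):
    """Request probabilities proportional to 1 / rank**skew."""
    weights = [1.0 / (rank ** skew) for rank in range(1, num_contents + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def popularity_aware_cache(popularity, cache_size):
    """Indices of the cache_size most popular contents."""
    order = sorted(range(len(popularity)), key=lambda f: -popularity[f])
    return set(order[:cache_size])

popularity = zipf_popularity(100, skew=1.0)
# Probability that a Zipf-distributed request hits the local cache.
hit_ratio = {size: sum(popularity[f]
                       for f in popularity_aware_cache(popularity, size))
             for size in (5, 10, 30)}
```

With 100 contents and skewness 1, caching the 5, 10 and 30 most popular contents covers roughly 44%, 56% and 77% of such requests, respectively.
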

In Fig. 4, we compare the effects of different cache strategies.
In the random cache policy, the contents are randomly
cached with equal probability in each BS. The result shows
that the popularity-aware cache has better power-backhaul
efficiency in general. Specifically, with the same transmission
power cost of 38 dBm and the same cache size of 10, the
popularity-aware cache incurs only about 1/4 of the backhaul cost of
the random cache policy. However, in the extreme case when
we do not consider the total transmission power, the minimum
backhaul costs of the two caching strategies are about the same
if the cache size is 30.

Fig. 4: Power-backhaul tradeoff for different cache policies. [Axis label: Total Power Consumption (dBm)]

In Fig. 5, we compare the power-backhaul tradeoff of
multicast and unicast transmission in the same scenario. We
use the popularity-aware caching policy with the cache size of 10.
In unicast transmission, users of the same group are served by
different beamformers and we use the algorithm proposed in
a, where in each iteration, for $\forall k, l$, the corresponding weight factor is
set to 0 if the content requested by user k is cached in BS
l. If multiple users in the same group are served by the same
BS, the backhaul cost is counted only once. The figure shows
that in the power-limited system, the total power consumption
of multicast transmission is about 2 dB less than the unicast
scenario. When the total power consumption is 38 dBm, the
backhaul cost of multicast transmission is only 1/3 of unicast
transmission. These observations validate the advantage of
caching and multicasting in such a scenario.

VI. Conclusion 

This paper investigates the joint design of multicast beamforming
and content-centric BS clustering in a cache-enabled
cloud RAN. The optimization problem is formulated as the
minimization of the total network cost, including the power
consumption and the backhaul cost, under the QoS constraint
of each multicast group. We adopt the SDR method and
the smooth function approach, introduced in sparse signal
processing, to approximate this non-convex problem as a
DC programming problem. We then propose a DC algorithm
to obtain sub-optimal solutions. Further, we propose a new
smoothness updating method and give insight into its connection
to reweighted $\ell_1$-norm minimization. Simulation results
show that, compared with unicast transmission, multicast
transmission can achieve better power-backhaul tradeoff, and
the backhaul cost can be further reduced with larger cache
size.

Fig. 5: Power-backhaul tradeoff for multicast and unicast transmission.